Skip to content

Conversation

@mjschmidt271
Copy link
Collaborator

This adds the AT2 workflow to test on SNL machines.

Currently, the only testing we do with this workflow is for GPU/cuda on H100 cards.

This autotesting workflow is triggered by pull requests to main and by a handful of actions on an open PR to main (e.g., reopened, converted to ready from draft, etc.).

A couple of notes (as I understand things--could be mistaken in minor ways):

  • If a PR contains changes to the .github directory, a member of the snl-testing-admins team must add the CI-AT2_special_approval to kick off the autotesting.
  • Otherwise, any PR that is submitted by a member of the snl-testing team, and only contains commits from members of that team should automatically trigger this autotesting.
  • In the case that the PR is submitted by someone who is not a member of the snl-testing team or contains commits from someone outside of that team, an approving review by someone on the snl-testing team is required to trigger autotesting.

Given the above, I have not been able to test the latter 2 conditions because my changes are not yet merged into main

@mjschmidt271
Copy link
Collaborator Author

@jaelynlitz - It probably makes sense to also remove the PNNL workflow with this PR, so let me know if that works for you

@@ -0,0 +1,36 @@
name: SNL-AT2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should name this 'AT' only?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the software product that facilitates this is "AT2 (Autotester2)"--however, what we call it on our side doesn't make a bit of difference to me

@jeff-cohere
Copy link
Collaborator

Is this test failure an issue for this PR, or is it a known and unrelated thing?

The following tests FAILED:
	 10 - mam4_nucleate_ice_unit_tests (Failed)

Comment on lines 20 to 26
uses: actions/checkout@v4
with:
persist-credentials: false
show-progress: false
submodules: recursive
- name: Cloning Haero
uses: actions/checkout@v3
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the same version of checkout across the actions? 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤯

@@ -0,0 +1,104 @@
name: gcc-cuda
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What versions of gcc and cuda are we using?

-DNUM_VERTICAL_LEVELS=72 \
-DENABLE_COVERAGE=OFF \
-DENABLE_SKYWALKER=ON \
-DCMAKE_CUDA_ARCHITECTURES=90 \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is a var above I'm inclined to reference the var over a hard coded value

@jaelynlitz
Copy link
Collaborator

@jaelynlitz - It probably makes sense to also remove the PNNL workflow with this PR, so let me know if that works for you

I can remove it in a subsequent PR so we don't delay yours being merged

@jaelynlitz
Copy link
Collaborator

One more thought @mjschmidt271 if it makes sense to add a README like we had for PNNL CI? Seems like this was less complicated though

@mam4xxSNL
Copy link
Collaborator

mam4xxSNL commented Apr 15, 2025

Is this test failure an issue for this PR, or is it a known and unrelated thing?

The following tests FAILED:
	 10 - mam4_nucleate_ice_unit_tests (Failed)

@odiazib @singhbalwinder It looks to me like it may be addressed by currently open PRs--is that correct?

Well, it appears to have been a one-off and is now passing 🧐

@codecov
Copy link

codecov bot commented Apr 16, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.68%. Comparing base (429f0a8) to head (4883778).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #426      +/-   ##
==========================================
- Coverage   95.69%   95.68%   -0.01%     
==========================================
  Files          46       46              
  Lines        9446     9438       -8     
==========================================
- Hits         9039     9031       -8     
  Misses        407      407              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jeff-cohere
Copy link
Collaborator

Is this test failure an issue for this PR, or is it a known and unrelated thing?

The following tests FAILED:
	 10 - mam4_nucleate_ice_unit_tests (Failed)

@odiazib @singhbalwinder It looks to me like it may be addressed by currently open PRs--is that correct?

Well, it appears to have been a one-off and is now passing 🧐

Hrmm! Well, here's what was reported, in case it rears its head again:

10: /home/runner/work/mam4xx/mam4xx/src/tests/mam4_nucleate_ice_unit_tests.cpp:203: FAILED:
10:   {Unknown expression after the reported line}
10: due to a fatal error condition:
10:   SIGFPE - Floating point error signal

@singhbalwinder
Copy link
Contributor

Thanks a lot, Mike, for adding this testing pipeline! As @jaelynlitz mentioned, we can remove PNNL's pipeline in a separate PR. We might want to run both pipelines for a few PRs before removing PNNL's pipeline just to monitor both pipelines.

@mam4xxSNL mam4xxSNL merged commit 6aca623 into main Apr 16, 2025
12 checks passed
@mam4xxSNL mam4xxSNL deleted the mjs/snl-autotester branch April 16, 2025 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants